CSA Social Marketing: News Clusters
Clustering News Topics
“Which topics are alike in terms of their audiences?”
Optimal Number of Clusters
K Means Clusters in PCA Space
PC & Quadrant Interpretations
These PC and quadrant interpretations reflect relations between the news topic clusters: what are the connecting threads between news topics in different parts of the news space? What dimensions differentiate between news topic clusters in this news space?
- PC1 (Psychological Distance)
- ⬅️ Broad Systemic Issues
- ➡️ Immediate/Current Civic Events
- PC2 (Issue Scale)
- ⬆️ Local scale
- ⬇️ Large scale
| ↖️ Community & social infrastructure | ↗️ Civic & local daily life |
| ↙️ Analytical & global systems | ↘️ Political & public affairs |
Note: some clusters cross quadrants. See Cluster 5, for example. Transportation and traffic being clustered together suggests these two topics are more related to each other than to other topics. But the transportation topic sits in the community & social infrastructure quadrant (broad issues, local scale), while traffic sits in the civic & local daily life quadrant (immediate events, local scale). So within this cluster of movement-related news topics, some are broader in scope than others.
Similarly, within Cluster 2, Economy is closer to the analytical & global systems quadrant than the other topics in its cluster (immigration, politics/gov, global affairs). The same holds for Climate and Environment compared to Housing/Homelessness and Social Justice within Cluster 1.
Biplot comparing to news gratifications
PC & Quadrant Interpretations (for arrows)
These PC and quadrant interpretations do not need to match the previous PC interpretations, which are about the relations between news topics. These interpretations are about the gratification arrows (what is news for? what do people hope to get out of it?). The critical insight here is how these gratifications orient when laid on top of the news topic space.
You can compare these gratification-based dimensions to the previous news topic-based PC dimensions to see how they relate. While the dimensions themselves don’t have to match, there will likely be functional or conceptual relationships and symmetries between the news topic PCs and these gratification PCs (if the data and analysis are capturing something meaningful).
- PC1 (Function)
- ⬅️ Uplift/Improvement
- ➡️ Pragmatic
- PC2 (Modality)
- ⬆️ Affective
- ⬇️ Cognitive/Intellectual
| ↖️ News as empowerment | ↗️ News as daily tool |
| ↙️ News as personal enrichment | ↘️ News as moral-intellectual information |
Biplot comparing to news characteristics
PC and Quadrant Interpretations (for arrows)
These PC and quadrant interpretations are about the characteristic arrows (what qualities of the news do people seek/like?). How do these news characteristic preferences orient when laid on top of the news topic space?
- PC1 (Epistemic)
- ⬅️ Opinionated/Entertaining
- ➡️ Factual/Informational
- PC2 (Tone)
- ⬆️ Affective
- ⬇️ Analytical
| ↖️ Opinionated engaging news | ↗️ Purely practical news |
| ↙️ Analytical/perspective news | ↘️ Serious news |
Note: no arrows point to the upper-right quadrant, which means none of the news characteristics are positively associated with the combination factual/informational + affective. This suggests everyday news is viewed as practical but not tied to distinctive content qualities: neither exciting nor intellectually rich, just plain useful information, high in relevance and low in personality.
Biplot comparing to demographics
Note: I applied an arrow filter to show only those with a correlation over .1 (there would be 44 arrows plotted otherwise).
PC and Quadrant Interpretations (for arrows)
These PC and quadrant interpretations are about the demographic arrows: how do respondent demographics orient when laid on top of the news topic space?
- PC1 (Age/politics)
- ⬅️ Younger, diverse progressives
- ➡️ Older, whiter, traditionalists
- PC2 (Gender/income/education)
- ⬆️ Less well-off
- ⬇️ More well-off
| ↖️ Less Well Off Progressives | ↗️ Less Well Off Traditionalists |
| ↙️ Well Off Progressives | ↘️ Well Off Traditionalists |
Clustering news gratifications
“Which gratifications are alike in terms of their needers?”
Optimal number of clusters
K Means
PC and Quadrant Interpretations
- PC1 (Orientation)
- ⬅️ Self-enhancement/Affirming
- ➡️ Civic/Relational
- PC2 (Function)
- ⬆️ Affective/Moral
- ⬇️ Cognitive/Pragmatic
| ↖️ Self Uplift | ↗️ Communal Connection |
| ↙️ Intellectual Affirmation | ↘️ Civic Pragmatism |
Biplot comparing to news topics
PC and Quadrant Interpretations (for arrows)
- PC1 (Psychological Distance)
- ⬅️ Broad systemic issues
- ➡️ Immediate/Current Civic Events
- PC2 (Function)
- ⬆️ Moral?
- ⬇️ Practical??
| ↖️ Systemic Issues News | ↗️ Current Events/Immediate News |
| ↙️ - | ↘️ - |
Biplot comparing to news characteristics
PC and Quadrant Interpretations (for arrows)
- PC1 (Psychological Distance)
- ⬅️ Partisan/Entertaining
- ➡️ Factual
- PC2 (Function)
- ⬆️ ?
- ⬇️ ?
| ↖️ Partisan/Entertaining News | ↗️ Factual News |
| ↙️ - | ↘️ - |
Partisan/Entertaining news = Uplift/Self Affirming; Factual news = Civic Moral
Biplot comparing to demographics
PC & Quadrant Interpretations (for arrows)
- PC1 (Age)
- ⬅️ Millennial
- ➡️ Boomer
- PC2 (Societal Engagement)
- ⬆️ High Engagement
- ⬇️ Low Engagement
| ↖️ Highly engaged Millennials | ↗️ Highly engaged Boomers |
| ↙️ Low engaged Millennials | ↘️ Low engaged Boomers |
Millennials = identity-driven needs; Boomers = civic sense-making needs
High engagement = moral needs; Low engagement = practical needs
Clustering news characteristics
“Which characteristics are alike in terms of their admirers?”
Optimal number of clusters
K Means
PC and Quadrant Interpretations
- PC1 (Information type)
- ⬅️ Partisan/Personality
- ➡️ Credible
- PC2 (Worldview)
- ⬆️ Familiar/Comfortable
- ⬇️ Challenging/Novel
| ↖️ Comfortable & Partisan | ↗️ Credible and Familiar |
| ↙️ Challenging and Entertaining | ↘️ Novel and Credible |
Biplot comparing to news topics
PC and Quadrant Interpretations (for arrows)
- PC1 (Information type)
- ⬅️ Entertaining
- ➡️ Trustworthy
- PC2 (Psychological Distance)
- ⬆️ Immediate/Current Civic Events
- ⬇️ Broad Systemic Issues
| ↖️ Comfortable & Partisan | ↗️ Credible and Familiar |
| ↙️ Challenging and Entertaining | ↘️ Novel and Credible |
Respondents associate most news topics with fact-based, credible qualities rather than partisan or entertaining ones, with the exception of sports. The few arrows pointing up (weather, local, politics) are more habitual current events; the rest, pointing down, are more challenging.
Biplot comparing to news gratifications
PC and Quadrant Interpretations (for arrows)
- PC1 (Orientation)
- ⬅️ Self-Affirming
- ➡️ Civic/Relational
- PC2 (Worldview)
- ⬆️ Validates
- ⬇️ Challenges
| ↖️ Validating news | ↗️ - |
| ↙️ Challenging-Moral news | ↘️ Self-Enhancing news |
Biplot comparing to demographics
PC1 divides traditional trust-oriented audiences (right) from expressive, personality-based news consumers (left)
PC2 separates comfort-seeking, familiar news users (top) from those who seek new or challenging perspectives (bottom)
Exploring different arrow filters
For these experiments, I am looking for a method that can reliably identify the most important arrows to plot on the biplots. We don’t want to show noise on the plots, and simpler plots that show only the most statistically reliable information will be more useful to our audience and for interpretation.
For this experiment, I will do the following tests on the same dataset to see how the results change:
- Correlation test p values:
  - Simple, independent univariate tests of the correlation between each arrow and PC1 and PC2.
  - Pro: fast and easy to implement
  - Con: ignores multivariate relations (between variables and PC1/PC2) and inherits the usual issues with p values (e.g., large samples can produce significant p values for very small correlations)
- Regression-based p values:
  - The PCs are regressed on the data from each arrow independently. The p values for the PC1 and PC2 regressors are saved and filtered for significance.
  - Pro: can account for multivariate PC relations better than univariate tests
  - Con: same issues with p values (small relationships can be significant with large samples); direction of regression is arbitrary (no causal claims can be made)
- Bootstrap stability:
  - Respondents are randomly resampled (with replacement) N times, each time recomputing the arrow correlations with PC1 and PC2. Check for consistency in arrow direction and magnitude (based on a minimum correlation threshold).
  - Pro: measures consistency across different portions of the same sample
  - Con: slow and computationally expensive
- Permutation test:
  - PC scores are randomly permuted (shuffled) N times and correlated with the arrows. This creates a null distribution of the correlation values that could be expected if the data were meaningless noise (because the scores were shuffled). Observed correlations (from the actual data) that are far enough away from the null distribution are treated as statistically meaningful.
  - Pro: controls for multiple tests, statistically most sound option for inference
  - Con: also slow and computationally expensive
- Vector length threshold:
  - Arrows are filtered explicitly by a correlation threshold decided by the researcher.
  - Pro: simple and easy to implement; the researcher decides what size of correlation is meaningful (compared to p value-based filtering, which might keep small but significant correlations)
  - Con: can be arbitrary, not statistically sound
- Dual PCA:
  - Instead of plotting each arrow independently, we can reduce the arrows to their PC1 and PC2 components and plot those instead. So there are two PCAs, one on the main data and one on the second data (arrows), and plotting PC arrows on PC space allows us to compare how the two datasets’ dimensionalities are related. In this way, it’s not just another filtering option, but actually another way to analyze the data.
  - Pros: can directly relate more latent/meaningful clusters that provide a more concise summary of the relations between different questions
  - Cons: results are more abstracted from the raw data, so they might be harder for audiences to understand (though good storytelling can address this). Might also over-summarize if the individual item arrows are more nuanced and interpretable.
- Dual Combo PCA:
  - Same as dual PCA, but plots the individual arrows as well as the PC arrows on the same plot so you can see the full details.
  - Pros: can see the individual arrow directions, which may not exactly match the summarized PC arrows
  - Cons: could be a messy plot
The main dataset will be the news topics. The second dataset will be news gratifications (13 total arrows)
Simple correlation tests
Filtered to 13 arrows.
Results
Kept all arrows since all had very small p values (<.0001) on at least one of the components (because of sample size).
p_PC1
Aligns with my own values and point of view 6.964828e-01
Exposes me to other viewpoints that challenge my own 8.044110e-02
Gives me something to talk about 1.597246e-02
Helps me become a better person 4.311683e-26
Helps me empathize with others 2.098118e-25
Makes me better prepared to participate in public life 2.195895e-01
Makes me feel empowered 3.153366e-77
Makes me feel like an expert in something 1.940650e-57
Makes me feel like I’m part of a community 2.145107e-03
Makes me feel prepared for the day 1.924739e-30
Makes me feel seen and heard 3.774850e-83
Makes me feel smart 2.272112e-56
Makes me hopeful 8.508281e-31
p_PC2
Aligns with my own values and point of view 3.034807e-07
Exposes me to other viewpoints that challenge my own 1.414366e-41
Gives me something to talk about 2.850708e-02
Helps me become a better person 4.341821e-04
Helps me empathize with others 9.796274e-01
Makes me better prepared to participate in public life 2.248441e-04
Makes me feel empowered 4.141251e-05
Makes me feel like an expert in something 2.603076e-07
Makes me feel like I’m part of a community 8.427720e-05
Makes me feel prepared for the day 2.109493e-09
Makes me feel seen and heard 1.746748e-01
Makes me feel smart 1.621236e-13
Makes me hopeful 8.214912e-05
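Why a p-value filter keeps nearly everything at this sample size can be illustrated with a little arithmetic. The sketch below is in Python and is not the code used for this report; the sample size `n = 5000` and the example correlations are hypothetical, and the p value uses the Fisher z normal approximation rather than an exact t test.

```python
import math

def corr_p_value(r: float, n: int) -> float:
    """Two-sided p value for a Pearson correlation r from n respondents.

    Uses the Fisher z transform with a normal approximation.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

n = 5000  # hypothetical sample size, not the survey's actual n
for r in (0.03, 0.05, 0.10, 0.30):
    # Even very short arrows (small r) reach conventional significance.
    print(f"r = {r:.2f}  p = {corr_p_value(r, n):.2e}")
```

With a large sample, the p value is driven mostly by n, so it says little about whether an arrow is long enough to be worth plotting.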
Regression filter
Filtered to 13 arrows.
Results
Same as the correlation test, kept all arrows due to small p values on at least one of the components
Variable p_PC1
1 Aligns with my own values and point of view 4.671346e-01
2 Exposes me to other viewpoints that challenge my own 7.843474e-03
3 Gives me something to talk about 1.050447e-02
4 Helps me become a better person 3.686136e-25
5 Helps me empathize with others 1.637149e-25
6 Makes me better prepared to participate in public life 1.407833e-01
7 Makes me feel empowered 1.841598e-75
8 Makes me feel like an expert in something 1.867985e-55
9 Makes me feel like I’m part of a community 8.491964e-04
10 Makes me feel prepared for the day 1.034545e-28
11 Makes me feel seen and heard 2.744509e-84
12 Makes me feel smart 1.465402e-53
13 Makes me hopeful 2.641013e-32
p_PC2
1 2.500893e-07
2 1.899959e-42
3 1.860123e-02
4 4.338781e-03
5 4.776839e-01
6 1.581858e-04
7 3.129902e-03
8 2.948656e-05
9 3.420347e-05
10 1.230449e-07
11 7.659179e-03
12 1.147359e-10
13 2.195292e-06
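One possible reading of the regression filter is sketched below in Python on simulated data. It is not the code behind the table above: it assumes each item is regressed on PC1 and PC2 together, uses a normal approximation for the p values, and all names (`regression_p_values`, `item`, `pc1`, `pc2`) are hypothetical.

```python
import math
import random

def regression_p_values(y, x1, x2):
    """Fit y = b0 + b1*x1 + b2*x2 by OLS; return two-sided p values for b1, b2.

    Uses a normal approximation to the t distribution (reasonable for large n).
    """
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    yc = [v - my for v in y]
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    s11 = sum(u * u for u in a)
    s22 = sum(v * v for v in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * w for u, w in zip(a, yc))
    s2y = sum(v * w for v, w in zip(b, yc))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the two slopes.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((w - b1 * u - b2 * v) ** 2 for w, u, v in zip(yc, a, b))
    sigma2 = rss / (n - 3)  # residual variance with 3 fitted parameters
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    p1 = math.erfc(abs(b1 / se1) / math.sqrt(2))
    p2 = math.erfc(abs(b2 / se2) / math.sqrt(2))
    return p1, p2

rng = random.Random(0)
n = 2000  # hypothetical sample size
pc1 = [rng.gauss(0, 1) for _ in range(n)]
pc2 = [rng.gauss(0, 1) for _ in range(n)]
item = [0.3 * u + rng.gauss(0, 1) for u in pc1]  # related to PC1 only
p_pc1, p_pc2 = regression_p_values(item, pc1, pc2)
print(f"p_PC1 = {p_pc1:.3e}, p_PC2 = {p_pc2:.3e}")
```

Because PC scores from the same PCA are uncorrelated, the two slopes carry nearly the same information as the separate correlations, which is consistent with the two filters agreeing here.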
Bootstrap filter
Running 500 bootstraps...
..........
Filtered to 8 arrows.
Results
This time it kept only 8 arrows. The bootstrap filter removed the arrows that tended to be shorter (those less correlated with the PCs) while keeping the longer arrows (those more correlated with the PCs). This is exactly the pattern we would expect when filtering for importance.
In the scores below, you can see that these arrows all have a stability of >.8 for at least one component. An arrow with high stability on both components is meaningfully related to both PCs.
stability_PC1
Exposes me to other viewpoints that challenge my own 0.000
Helps me become a better person 0.836
Makes me feel empowered 1.000
Makes me feel like an expert in something 1.000
Makes me feel prepared for the day 0.970
Makes me feel seen and heard 1.000
Makes me feel smart 1.000
Makes me hopeful 0.958
stability_PC2
Exposes me to other viewpoints that challenge my own 1.000
Helps me become a better person 0.000
Makes me feel empowered 0.000
Makes me feel like an expert in something 0.000
Makes me feel prepared for the day 0.000
Makes me feel seen and heard 0.000
Makes me feel smart 0.002
Makes me hopeful 0.000
This seems like a good filtering method moving forward.
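A minimal Python sketch of the bootstrap stability score follows, on simulated data. It is an assumption-laden illustration rather than the code behind these results: stability is taken to be the share of resamples in which the correlation keeps its observed sign and reaches a minimum size, and `min_r = 0.1`, the function names, and the simulated items are all hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_stability(item, pc, n_boot=300, min_r=0.1, seed=1):
    """Share of resamples where r keeps its observed sign and |r| >= min_r."""
    rng = random.Random(seed)
    n = len(item)
    sign = 1 if pearson(item, pc) >= 0 else -1
    hits = 0
    for _ in range(n_boot):
        # Resample respondents with replacement; PC scores stay fixed.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([item[i] for i in idx], [pc[i] for i in idx])
        if sign * r >= min_r:
            hits += 1
    return hits / n_boot

rng = random.Random(0)
n = 1000  # hypothetical sample size
pc1 = [rng.gauss(0, 1) for _ in range(n)]
strong = [0.4 * u + rng.gauss(0, 1) for u in pc1]   # long arrow on PC1
weak = [0.02 * u + rng.gauss(0, 1) for u in pc1]    # short arrow on PC1
stab_strong = bootstrap_stability(strong, pc1)
stab_weak = bootstrap_stability(weak, pc1)
print(stab_strong, stab_weak)
```

An arrow would then be kept when its stability exceeds a cutoff (such as the .8 mentioned above) on at least one component.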
Permutation test
Running 1000 permutations...
.
Filtered to 13 arrows.
Results
The permutation test kept all arrows. This might be related to the sample size issue: the null distribution of correlations is tightly bound to 0, so even small correlations can easily fall far enough away from it.
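The sample size explanation can be checked with a small Python simulation. This is a sketch under stated assumptions, not the code used here: the data are simulated, `n = 2000` is hypothetical, and the p value is computed per arrow as the share of shuffled correlations at least as extreme as the observed one.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_test(item, pc, n_perm=500, seed=1):
    """Return observed r, permutation p value, and the null correlations."""
    rng = random.Random(seed)
    r_obs = pearson(item, pc)
    shuffled = list(pc)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between item and PC scores
        null.append(pearson(item, shuffled))
    extreme = sum(abs(r) >= abs(r_obs) for r in null)
    return r_obs, (1 + extreme) / (1 + n_perm), null

rng = random.Random(0)
n = 2000  # hypothetical sample size
pc1 = [rng.gauss(0, 1) for _ in range(n)]
item = [0.15 * u + rng.gauss(0, 1) for u in pc1]  # a fairly short arrow
r_obs, p_value, null = permutation_test(item, pc1)
null_sd = statistics.pstdev(null)
print(f"r = {r_obs:.3f}, p = {p_value:.4f}, null sd = {null_sd:.4f}")
```

The spread of the null distribution is roughly 1/sqrt(n), so with thousands of respondents even a modest correlation sits several standard deviations from zero.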
Vector length threshold
Filtered to 9 arrows.
Results
Like the bootstrap method, filtering by a minimum correlation size that is considered meaningful also removes the smaller arrows that are less correlated with the PCs. However, the threshold is highly arbitrary. The bootstrap method is more robust.
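For comparison, the threshold filter is only a few lines. The Python sketch below assumes the arrow length is the Euclidean length of the (PC1, PC2) correlation pair; the report's filter may instead threshold each correlation separately. The item names and correlation values are placeholders, not results from this survey.

```python
import math

# Hypothetical (r_PC1, r_PC2) correlations for four placeholder items.
arrows = {
    "item_a": (0.35, 0.02),
    "item_b": (0.04, -0.06),
    "item_c": (-0.08, 0.09),
    "item_d": (0.01, 0.28),
}

def filter_by_length(arrows, min_length=0.1):
    """Keep arrows whose length sqrt(r1^2 + r2^2) reaches min_length."""
    return [
        name
        for name, (r1, r2) in arrows.items()
        if math.hypot(r1, r2) >= min_length
    ]

kept = filter_by_length(arrows)
print(kept)
```

Changing `min_length` changes which arrows survive, which is the arbitrariness noted above.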
Dual PCA
Filtered to 4 arrows.
Results
Interpreting the meta-PC plot requires care, since it overlays two principal-component (PC) spaces—the news-topic PCA and the gratification (needs) PCA—on the same axes.
Both gratification-space PC arrows point mainly horizontally, indicating that their strongest association is with PC1 of the news-topic space—the Systemic ↔︎ Immediate/Current dimension. In other words, the correspondence between the two spaces is largely one-dimensional.
People whose news interests emphasize immediate, civic, or community-focused topics (e.g., weather, local issues, traffic, politics) also tend to have pragmatic and relational news needs. Conversely, those who focus more on systemic or abstract issues (e.g., education, science, global affairs) tend to express ideological, moral, or self-affirming needs.
The gratification arrows are not perfectly horizontal (they tilt slightly along the diagonal), suggesting a minor relationship with PC2 of the news-topic space (the Local ↔︎ Large-scale contrast). This weak diagonal component implies that people with civic-relational needs lean slightly toward local-scale news, whereas those with self-enhancing or affirming needs lean slightly toward large-scale topics. However, this vertical component is small, so the association between the Local ↔︎ Large-scale distinction and gratification needs is limited.
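How meta-PC arrows of this kind can be computed is sketched below in Python with NumPy, on simulated data. This is an illustration under assumptions, not the analysis code: both item blocks are driven by one shared hypothetical latent factor, `pca_scores` is a helper defined here, and each meta arrow is taken to be the correlation of a second-dataset PC with the two main-dataset PCs.

```python
import numpy as np

def pca_scores(data, k=2):
    """Return the first k PC scores and their share of variance explained."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :k] * s[:k]
    explained = (s ** 2 / np.sum(s ** 2))[:k]
    return scores, explained

rng = np.random.default_rng(0)
n = 1000  # hypothetical number of respondents
latent = rng.normal(size=n)  # one shared factor behind both blocks
topics = latent[:, None] * rng.uniform(0.5, 1.0, size=6) + rng.normal(size=(n, 6))
grats = latent[:, None] * rng.uniform(0.5, 1.0, size=5) + rng.normal(size=(n, 5))

topic_pcs, _ = pca_scores(topics)        # main space (plot axes)
grat_pcs, grat_var = pca_scores(grats)   # second space (meta arrows)

# Row j holds the arrow for gratification PC j: its correlations with
# topic PC1 (horizontal) and topic PC2 (vertical).
meta = np.array([
    [np.corrcoef(grat_pcs[:, j], topic_pcs[:, i])[0, 1] for i in range(2)]
    for j in range(2)
])
print(np.round(meta, 2), np.round(grat_var, 2))
```

In this simulation the first meta arrow is almost purely horizontal because the two blocks share a single dimension, which mirrors the largely one-dimensional correspondence described above. The sign of each PC is arbitrary, so only the arrow's axis, not its direction, is meaningful without reference to the loadings.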
Dual Combo PCA
Filtered to 17 arrows.
Results
I tried to make it as clear as possible, but there are so many arrows and labels that it is hard to see what is going on in a single plot. The best solution I could find is to facet the plot and separate the individual arrows from the meta-PC arrows. If we ever use this approach, this might be the best way to do so, perhaps even showing the panels sequentially so it is not too much at once.
Right now the plot with individual arrows is not filtered for importance, mainly because having all the arrows helps with interpreting the meta-PC arrow space. But I can imagine combining this approach with a filtering method (e.g., bootstrap) to make the individual arrow plot more succinct.
Additional Experiments
Bootstrap filtering on characteristic PCA space
The news topics set has 17 arrows.
Running 500 bootstraps...
..........
Filtered to 5 arrows.
Results
Out of 17 total news topic arrows, the bootstrap filter method kept 5, the longest arrows from the original biplot.
Bootstrap filtering on gratification PCA space
The news topics set has 17 arrows.
Running 500 bootstraps...
..........
Filtered to 10 arrows.
Results
Out of 17 total news topic arrows, the bootstrap filter method kept 10, the longest arrows from the original biplot.
Test dual combo biplot on constrained arrows PCA space
Since the plot above is harder to interpret because all the arrows point in a similar direction (up), I will see whether distilling the news topics down to their PCs illuminates the underlying relationship between news gratifications and news topics. I will also add the option to run a bootstrap filter on the individual arrow plot to help with interpretation, and a scale factor on the meta-PC arrows to reflect how much of the arrows' variance each explains (i.e., their importance).
Running bootstrap filter on individual arrows...
..........
Filtered to 14 arrows.
Results
This experiment shows that when the plot of individual arrows is unclear, including the meta-PC plot can help uncover the underlying relationships. Although none of the topic arrows point downwards, the meta-PC arrows do. The downward meta arrows are not saying "there are individual variables that literally point down." They are saying that, within the set of upward-pointing news arrows, some arrows sit higher and some lower, and this vertical pattern is what differentiates most between news topics (PC1: systemic vs. immediate civic topics). This news topic PC1 relates most strongly to the gratification PC1. Interest in systemic topics is more aligned with affective/moral gratifications; interest in immediate/civic event topics is more aligned with cognitive/pragmatic gratifications.
Then, within the set of upward-pointing news arrows, a smaller pattern exists where some point left and others right. This is the news topic PC2 (local vs. large scale), which ends up being related to both the gratification PC1 and PC2 (since it is diagonal). Interest in large-scale topics goes with interest in affective/moral and self-enhancing/affirming gratifications; interest in local-scale topics goes with interest in cognitive/pragmatic and civic/relational gratifications. But this set of relations is a smaller effect overall than the PC1 patterns above.
The meta-PC space identifies latent contrasts among the individual news topics — capturing how they co-vary rather than how topics individually correlate with gratification PCs.
In this example, the arrows are all constrained upwards, meaning their variance in direction is low and they form basically one cluster. In this case, Meta-PC1 is the overall average direction of the clustered arrows and simply points along the shared direction. Meta-PC2 captures the residual differences among them (the main orthogonal contrast within that cluster), although it explains less variance.
When all the arrows point roughly the same way, the meta-PCs mostly describe subtle within-cluster contrasts rather than big, new directions.
Test dual combo biplot on diverse arrows space
In this example, the individual demographic arrows are more spread out, so the meta-PCs should be more aligned with the individual arrows. There are 44 demographic variables.
Running bootstrap filter on individual arrows...
..........
Filtered to 19 arrows.
Results
As expected, the bootstrap filtered out the least important demographics, keeping only 19 out of 44, and the meta-PC arrows are more easily interpretable because they follow the same directionality as the individual arrows (this is possible because the individual arrows already traverse the whole space, as opposed to the constrained example above).